Image-Based Query by Example Using MPEG-7 Visual Descriptors

Author: Ventura Royo Carles
Publication venue: Escola Tècnica Superior de Telecomunicació de Barcelona
Publication date: 01/01/2010

This project presents the design and implementation of a Content-Based Image Retrieval (CBIR) system in which queries are formulated by visual example through a graphical interface. The visual descriptors and similarity measures implemented in this work mainly follow those defined in the MPEG-7 standard, although extensions are proposed where necessary. Although this is an image-based system, all the proposed descriptors have been implemented for both image and region queries, so that the system can later be upgraded to support region-based queries. For this reason, even a contour shape descriptor has been developed, although it is not meaningful for a whole image. The system has been assessed on different benchmark databases, namely the MPEG-7 Common Color Dataset and the Corel Dataset. The evaluation has been performed for isolated descriptors as well as for combinations of them. The strategy studied in this work for combining the information obtained from the whole set of computed descriptors is to weight the rank list of each isolated descriptor.
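
The rank-list weighting mentioned in the abstract could be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the fusion formula, so a weighted sum of rank positions is assumed here, and the function name `fuse_rank_lists`, the descriptor names and the weights are all hypothetical.

```python
from collections import defaultdict


def fuse_rank_lists(rank_lists, weights):
    """Fuse per-descriptor rankings into one list by a weighted rank sum.

    rank_lists: dict mapping a descriptor name to a list of image ids,
                ordered from most to least similar to the query.
    weights:    dict mapping the same descriptor names to non-negative weights.

    Assumption (not from the source): an image missing from a ranking is
    given the worst possible rank for that descriptor, len(ranking).
    """
    all_items = set()
    for ranking in rank_lists.values():
        all_items.update(ranking)

    scores = defaultdict(float)
    for name, ranking in rank_lists.items():
        position = {item: i for i, item in enumerate(ranking)}
        penalty = len(ranking)
        for item in all_items:
            scores[item] += weights[name] * position.get(item, penalty)

    # Lower fused score means a better rank; ties are broken by image id.
    return sorted(all_items, key=lambda item: (scores[item], item))


# Hypothetical rankings from two descriptors for the same query image.
example_lists = {
    "color": ["a", "b", "c"],
    "texture": ["b", "a", "c"],
}
fused = fuse_rank_lists(example_lists, {"color": 1.0, "texture": 3.0})
```

With the texture ranking weighted three times as heavily as the color ranking, image `b` moves ahead of image `a` in the fused list.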

Tools for image retrieval in large multimedia databases

Author: Ventura Royo Carles
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011

One of the challenges in developing an image retrieval system is to achieve an efficient indexing scheme, since both developers and users, who are used to issuing requests to find a multimedia element in a large database, can be frustrated by the long computation time of a search. Traditional indexing schemes neither fulfill the dynamic indexing requirement, which allows elements to be added to or removed from the structure, nor fit well in high-dimensional feature spaces, owing to the phenomenon known as "the curse of dimensionality". After analyzing several indexing techniques from the literature, we decided to implement an indexing scheme called the Hierarchical Cellular Tree (HCT), which was designed to provide an effective solution especially for indexing large multimedia databases. The HCT has improved the performance of our image retrieval system based on the MPEG-7 visual descriptors. We have also contributed modifications to the original HCT that improve its performance. We propose a redefinition of the covering radius that considers not only the elements belonging to the cell but also all the elements that descend from that cell. Since this consideration implies a much more computationally costly algorithm, we propose an approximation by excess, i.e. an overestimate, for the covering radius value. We have also implemented a method that updates the covering radius to its actual value whenever desired. In addition, the pre-emptive insertion method has been adapted as a search technique to improve on the performance of the retrieval scheme called Progressive Query, which was originally proposed for use over the HCT.
Furthermore, the HCT indexing scheme has been adapted to a client/server architecture by means of a messaging system called KSC, which keeps the HCT loaded on a server waiting for the query requests launched by the clients of the retrieval system. Finally, the tool used to request a search over the indexed database has been adapted to a graphical user interface named GOS (Graphic Object Searcher), which lets the user issue searches in a more user-friendly way.
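
The idea of an overestimated covering radius that accounts for all descendants of a cell can be illustrated with a small sketch. It is not the thesis's algorithm: the abstract gives no formula, so a triangle-inequality bound (distance to a child nucleus plus that child's own bound) is assumed, and the `Cell` class, its fields and both function names are hypothetical simplifications of the HCT structure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical, simplified tree cell (not the actual HCT layout)."""
    nucleus: tuple                                  # representative feature vector
    items: list = field(default_factory=list)       # vectors stored in a leaf cell
    children: list = field(default_factory=list)    # sub-cells of an upper-level cell


def descendant_items(cell):
    """Yield every feature vector stored at or below this cell."""
    yield from cell.items
    for child in cell.children:
        yield from descendant_items(child)


def exact_radius(cell):
    """Exact covering radius over all descendants; visits every item."""
    return max((math.dist(cell.nucleus, x) for x in descendant_items(cell)),
               default=0.0)


def upper_bound_radius(cell):
    """Overestimate of the covering radius using only child summaries.

    Assumed bound: d(n, x) <= d(n, c) + r(c) for any item x below child c
    with nucleus c and radius bound r(c), by the triangle inequality.
    """
    own = max((math.dist(cell.nucleus, x) for x in cell.items), default=0.0)
    below = max((math.dist(cell.nucleus, child.nucleus) + upper_bound_radius(child)
                 for child in cell.children), default=0.0)
    return max(own, below)


leaf_a = Cell(nucleus=(0.0, 0.0), items=[(0.0, 0.0), (1.0, 0.0)])
leaf_b = Cell(nucleus=(4.0, 0.0), items=[(4.0, 0.0), (4.0, 3.0)])
root = Cell(nucleus=(0.0, 0.0), children=[leaf_a, leaf_b])
```

In a real index the per-child bounds would be cached rather than recomputed, which is what makes the overestimate cheaper than the exact value; recomputing `exact_radius` corresponds to updating the radius to its actual value on demand.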

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

UPC at MediaEval 2013 Hyperlinking Task

Author: Giró Nieto Xavier
Tella-Amo Marcel
Ventura Royo Carles
Publication venue: CEUR Workshop Proceedings
Publication date: 01/01/2013

This working notes paper presents the contribution of the UPC team to the Hyperlinking sub-task of the Search and Hyperlinking Task at MediaEval 2013. Our contribution explores the potential of a solution based only on visual cues. In particular, every automatically generated shot is represented by a keyframe. The linking between video segments is based on the visual similarity of the keyframes they contain. Visual similarity is assessed with the intersection of bag-of-features histograms generated with the SURF descriptor.
Postprint (published version).
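
Histogram intersection, the similarity measure named in the abstract, can be sketched as below. The sketch assumes the bag-of-features histograms have already been computed (SURF extraction and codebook assignment are not shown), and it L1-normalizes them before intersecting, which is an assumption rather than a detail stated in the paper. The function names and keyframe ids are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two visual-word histograms.

    Both histograms are L1-normalized first (an assumption), so that
    keyframes with different numbers of local features stay comparable.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must use the same codebook size")
    total1, total2 = sum(h1), sum(h2)
    if total1 == 0 or total2 == 0:
        return 0.0  # a keyframe without features matches nothing
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


def rank_keyframes(query_hist, candidates):
    """Order candidate keyframe ids from most to least similar to the query.

    candidates: dict mapping a keyframe id to its histogram.
    """
    return sorted(candidates,
                  key=lambda k: histogram_intersection(query_hist, candidates[k]),
                  reverse=True)


# Hypothetical 4-word codebook and three candidate keyframes.
query = [2, 2, 0, 0]
candidates = {
    "shot_1": [1, 1, 1, 1],
    "shot_2": [4, 4, 0, 0],
    "shot_3": [0, 0, 3, 1],
}
ranking = rank_keyframes(query, candidates)
```

Here `shot_2` has the same word distribution as the query and ranks first, while `shot_3` shares no visual words with it and ranks last.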

UPCommons. Portal del coneixement obert de la UPC

RVOS: end-to-end recurrent network for video object segmentation

Author: Bellver Bueno Míriam
Girbau Xalabarder Andreu
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Ventura Royo Carles
Publication venue: Computer Vision Foundation
Publication date: 01/01/2019

Multiple-object video object segmentation is a challenging task, especially in the zero-shot case, where no object mask is given at the initial frame and the model has to find the objects to be segmented along the sequence. In our work, we propose a Recurrent network for multiple object Video Object Segmentation (RVOS) that is fully end-to-end trainable. Our model incorporates recurrence on two different domains: (i) the spatial, which allows the different object instances within a frame to be discovered, and (ii) the temporal, which keeps the segmented objects coherent over time. We train RVOS for zero-shot video object segmentation and are the first to report quantitative results on the DAVIS-2017 and YouTube-VOS benchmarks. Further, we adapt RVOS for one-shot video object segmentation by using the masks obtained in previous time steps as inputs to be processed by the recurrent module. Our model reaches results comparable to state-of-the-art techniques on the YouTube-VOS benchmark and outperforms all previous video object segmentation methods not using online learning on the DAVIS-2017 benchmark. Moreover, our model achieves faster inference runtimes than previous methods, reaching 44ms/frame on a P100 GPU.
Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

A closer look at referring expressions for video object segmentation

Author: Bellver Bueno Míriam
Giró Nieto Xavier
Kazakos Ioannis
Silberer Carina
Torres Viñals Jordi
Ventura Royo Carles
Publication venue: Springer Science and Business Media LLC
Publication date: 27/07/2022

The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work was partially supported by the projects PID2019-107255GB-C22 and PID2020-117142GB-I00 funded by MCIN/AEI/10.13039/501100011033 (Spanish Ministry of Science), and by the grant 2017-SGR-1414 of the Government of Catalonia. This work was also partially supported by the project RTI2018-095232-B-C22 funded by the Spanish Ministry of Science, Innovation and Universities.
Peer Reviewed. Postprint (published version).

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

RVOS: end-to-end recurrent network for video object segmentation

Author: Bellver Míriam
Girbau Xalabarder Andreu
Giró Nieto Xavier
Marqués Acosta Fernando
Salvador Aguilera Amaia
Ventura Royo Carles
Publication date: 15/06/2019

UPCommons. Portal del coneixement obert de la UPC

Visual object analysis using regions and local features

Author: Ventura Royo Carles
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016

The first part of this dissertation focuses on an analysis of the spatial context in semantic image segmentation. First, we review how spatial context has been tackled in the literature by local features and spatial aggregation techniques. Starting from a discussion of whether context is beneficial for object recognition, we extend a Figure-Border-Ground segmentation for local feature aggregation from ground-truth annotations to a more realistic scenario in which object proposal techniques are used instead. Whereas the Figure and Ground regions represent the object and its surroundings respectively, the Border is a region around the object contour, which is found to be the region with the richest contextual information for object recognition. Furthermore, we propose a new contour-based spatial aggregation technique for the local features within the object region, based on a division of the region into four subregions. Both contributions have been tested on a semantic segmentation benchmark with a combination of context-dependent and context-independent local features, which allows the models to learn automatically whether context is beneficial for each semantic category.
The second part of this dissertation addresses semantic segmentation for a set of closely related images from an uncalibrated multiview scenario. State-of-the-art semantic segmentation algorithms fail to segment the objects correctly from some viewpoints when the techniques are applied independently to each viewpoint image. The lack of a large number of annotations for multiview segmentation does not allow a model that is robust to viewpoint changes to be obtained. In this second part, we exploit the spatial correlation that exists between the images from different viewpoints to obtain a more robust semantic segmentation. First, we review the state-of-the-art co-clustering, co-segmentation and video segmentation techniques that aim to segment the set of images in a generic way, i.e. without considering semantics. Then, a new architecture that considers motion information and provides a multiresolution segmentation is proposed for the co-clustering framework; it outperforms state-of-the-art techniques for generic multiview segmentation. Finally, the proposed multiview segmentation is combined with the semantic segmentation results, giving a method for automatic resolution selection and a coherent semantic multiview segmentation.
Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa